Automated data mining runs

ABSTRACT

A data mining run that includes special analyses is triggered directly after new data has been loaded in a data warehouse environment, to enrich the newly loaded data with new attributes. The process automates replicating transaction data from a source system into a data warehouse, triggering a data mining procedure (such as a training or a prediction procedure) that enriches the data with new attributes, and triggering the upload of the enriched data back into the data warehouse.

TECHNICAL FIELD

[0001] This description relates to loading and using data in a data warehouse on a computer system.

BACKGROUND

[0002] Computer systems often are used to manage and process business data. To do so, a business enterprise may use various application programs running on one or more computer systems. Application programs may be used to process business transactions, such as taking and fulfilling customer orders, providing supply chain and inventory management, performing human resource management functions, and performing financial management functions. Data used in business transactions may be referred to as transaction data or operational data. Often, transaction processing systems provide real-time access to data, and such systems may be referred to as on-line transaction processing (OLTP) systems.

[0003] Application programs also may be used for analyzing data, including analyzing data obtained through transaction processing systems. In many cases, the data needed for analysis may have been produced by various transaction processing systems and may be located in many different data management systems. A large volume of data may be available to a business enterprise for analysis.

[0004] When data used for analysis is produced in a different computer system than the computer system used for analysis or when a large volume of data is used for analysis, the use of an analysis data repository separate from the transaction computer system may be helpful. An analysis data repository may store data obtained from transaction processing systems and used for analytical processing. The analysis data repository may be referred to as a data warehouse or a data mart. The term data mart typically is used when an analysis data repository stores data for a portion of a business enterprise or stores a subset of data stored in another, larger analysis data repository, which typically is referred to as a data warehouse. For example, a business enterprise may use a sales data mart for sales data and a financial data mart for financial data.

[0005] Analytical processing may be used to analyze data stored in a data warehouse or other type of analytical data repository. When an analytical processing tool accesses the data warehouse on a real-time basis, the analytical processing tool may be referred to as an on-line analytical processing (OLAP) system. An OLAP system may support complex analyses using a large volume of data. An OLAP system may produce an information model using a multi-dimensional presentation, which may be referred to as an information cube or a data cube.

[0006] One type of analytical processing identifies relationships in data stored in a data warehouse or another type of data repository. The process of identifying data relationships by means of an automated computer process may be referred to as data mining. Sometimes a data mining mart may be used to store a subset of data extracted from a data warehouse. A data mining process may be performed on data in the data mining mart, rather than on data in the data warehouse. The results of the data mining process then are stored in the data warehouse. The use of a data mining mart that is separate from a data warehouse may help decrease the impact on the data warehouse of a data mining process that requires significant system resources, such as processing capacity or input/output capacity. Also, data mining marts may be optimized for access by data mining analyses, which may make that access faster and more flexible.

[0007] One type of data relationship that may be identified by a data mining process is an associative relationship in which one data value is associated or otherwise occurs in conjunction with another data value or event. For example, an association between two or more products that are purchased by a customer at the same time may be identified by analyzing sales receipts or sales orders. This may be referred to as a sales basket analysis or a cross-selling analysis. The association of product purchases may be based on a pairing of two products, such as when a customer who purchases product A also purchases product B. The analysis may also reveal relationships among three products, such as when a customer who purchases product A and product B also typically purchases product C. The results of a cross-selling analysis may be used to promote associated products, such as through a marketing campaign that promotes the associated products or by locating the associated products near one another in a retail store, such as in the same aisle or on the same shelf.
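The pairing described above can be illustrated with a minimal sketch that counts how often two products appear on the same receipt. The function name `count_product_pairs` and the sample receipts are hypothetical and are shown only to make the idea concrete; an actual cross-selling analysis would typically also weigh how frequent each product is on its own.

```python
from collections import Counter
from itertools import combinations


def count_product_pairs(baskets):
    """Count how often each unordered pair of products shares a basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort the distinct products so (A, B) and (B, A) count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical sales receipts, each listing the products bought together.
receipts = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "B", "C"],
]
pair_counts = count_product_pairs(receipts)
```

Pairs with the highest counts are candidates for joint promotion or adjacent shelf placement.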

[0008] Customers that are at risk of not renewing a sales contract or not purchasing products in the future also may be identified by data mining. Such an analysis may be referred to as a churn analysis in which the likelihood of churn refers to the likelihood that a customer will not purchase products or services in the future. A customer at risk of churning may be identified based on having similar characteristics to customers that have already churned. The ability to identify a customer at risk of churning may be advantageous, particularly when steps may be taken to reduce the number of customers who do churn. A churn analysis may also be referred to as a customer loyalty analysis.

[0009] For example, in the telecommunications industry a customer may be able to switch from one telecommunications provider to another telecommunications provider relatively easily. A telecommunications provider may be able to identify, using data mining techniques, particular customers that are likely to switch to a different telecommunications provider. The telecommunications provider may be able to provide an incentive to at-risk customers to decrease the number of customers who switch.

[0010] In general, using data for special data analysis, such as the application of data mining techniques, involves a fixed sequence of processes, in which each process occurs only after the completion of a predecessor process. For example, in a data warehouse that uses a separate data mining mart for the performance of a data mining process, three processes may need to be performed in order. First, data must be loaded to a data warehouse from a transaction data management system. Second, data from the data warehouse must be copied to a data mining mart and the data mining process must be performed. Third, the enriched or new data that results from the data mining process must be loaded to the data warehouse. Each of those processes may be triggered separately, often by different users. As a result, the data mining process is performed separately from the loading of the new data to the data warehouse. In some cases, performing the data mining process may occur days, or even weeks, after the data has been loaded from the transaction processing system and is available for analysis.

[0011] A delay in performing data mining analysis may be problematic when the results of the analysis are most useful at a particular time. For example, the value of a churn prediction for a particular customer or group of customers may be time-sensitive. After a customer purchases a service or product elsewhere, the opportunity of the business enterprise to influence the behavior of the customer is lost. When the identification of a high likelihood of churning occurs after the customer has been lost, the data mining result is wasted.

[0012] Some aspects of creating or using a data warehouse may be automated, that is, initiated without user manipulation. For example, an automated software agent may be employed to collect data from various distributed databases for a data warehouse. Using an OLAP system, a report or other type of output may be automatically generated and sent to various receiving devices, such as a personal digital assistant, a printer, or a pager. When transaction data is input to a transaction processing system, the online transaction data may be automatically summarized and stored as summary data.

SUMMARY

[0013] Generally, the invention automates the triggering of special analyses directly after having loaded new data in a data warehouse environment to enrich the newly loaded data with new attributes. The invention automates, without requiring user manipulation, copying data from a data warehouse to a data mining mart, the triggering of a data mining procedure (such as a training or a prediction procedure) that enriches the data with new attributes, and the triggering of the upload of the enriched data to the data warehouse. The invention also may automate, without requiring user manipulation, the loading of transaction data from a source system into a data warehouse before triggering the data mining process. One area where the invention may find specific applicability is in performing a data mining procedure on a regular, predetermined basis. For example, sales receipts for a particular month may be automatically loaded into a data warehouse and analyzed for associative sales relationships. Another example is performing periodic analysis of customer activity to identify customers that are at risk of churning for the purpose of influencing customer behavior.

[0014] In one general aspect, a data mining process may be automatically triggered. An analytical process is triggered based on the presence of data in a data source that is used for analytical processing. The analytical process is performed on data from the data source after the analytical process has been triggered. The analytical process uses a procedure that also is usable in a data extraction process. A data attribute created by the analytical process is stored in the data source.

[0015] Implementations may include one or more of the features noted above and one or more of the following features. For example, the analytical process may be triggered based on the completion of a computer program for loading data to the data source that is used for analytical processing.

[0016] Also, data may be extracted from a data source used for transaction processing and loaded to the data source that is used for analytical processing. A person may initiate at most the step of extracting data from the data source used for transaction processing or the step of loading the extracted data. The occurrence of a predetermined date and time may trigger extracting the data or may trigger loading the data.

[0017] In addition to extracting data from a transaction data source, data also may be extracted from the data source that is used for analytical processing and loaded to temporary data storage. The analytical process may be performed on the data stored in the temporary data storage.

[0018] The types of analytical processes that may be triggered include an analytical process to determine a relationship between two data values in the data source, or to determine a relationship between two data values that predicts a likelihood of whether a particular customer will fail to purchase a service or product in the future. The analytical process also may apply a previously determined relationship to data values in the data source. The analytical process also may identify products or services that are purchased in the same transaction. For example, the analytical process may determine the likelihood of whether a particular customer will fail to purchase a service or a product in the future. The likelihood may be based on characteristics associated with customers who have been identified as failing to purchase a service or a product.

[0019] Implementations of the techniques discussed above may include a method or process, a system or apparatus, or computer software on a computer-accessible medium. The details of one or more implementations of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

[0020] FIG. 1 is a block diagram of a system incorporating various aspects of the invention.

[0021] FIG. 2 is a block diagram illustrating the enrichment of data stored in the data warehouse based on an automated data mining run.

[0022] FIGS. 3 and 4 are flow charts of processes to automate a data mining process.

[0023] FIG. 5 is a block diagram of the components of a software architecture for automating a data mining run.

[0024] FIG. 6 is a block diagram of a process to use a data mining workbench to design an automated data mining process.

DETAILED DESCRIPTION

[0025] FIG. 1 shows a block diagram of a system 100 of networked computers, including a computer system 110 for a data warehouse and transaction computer systems 120 and 130. The loading of new data to the data warehouse 110 from the transaction computer systems 120 and 130 triggers a special analysis to enrich the newly loaded data with new attributes.

[0026] The system 100 includes a computer system 110 for a data warehouse, a client computer 115 used to administer the data warehouse, and transaction computer systems 120 and 130, all of which are capable of executing instructions on data. As is conventional, each computer system 110, 120 or 130 includes a server 140, 142 or 144 and a data storage device 145, 146 or 148 associated with each server. Each of the data storage devices 145, 146 and 148 includes data 150, 152 or 154 and executable instructions 155, 156 or 158. A particular portion of data, here referred to as business objects 162 or 164, is stored in computer systems 120 and 130, respectively. Each of business objects 162 or 164 includes multiple business objects. Each business object in business objects 162 or 164 is a collection of data attribute values, and typically is associated with a principal entity represented in a computing device or a computing system. Examples of a business object include information about a customer, an employee, a product, a business partner, a sales invoice, and a sales order. A business object may be stored as a row in a relational database table, an object instance in an object-oriented database, data in an extensible mark-up language (XML) file, or a record in a data file. Attributes are associated with a business object. In one example, a customer business object may be associated with a series of attributes including a customer number uniquely identifying the customer, a first name, a last name, an electronic mail address, a mailing address, a daytime telephone number, an evening telephone number, date of first purchase by the customer, date of the most recent purchase by the customer, birth date or age of customer, and the income level of customer. In another example, a sales order business object may include a customer number of the purchaser, the date on which the sales order was placed, and a list of products, services, or both products and services purchased.

[0027] The data warehouse computer system 110 stores a particular portion of data, here referred to as data warehouse 165. The data warehouse 165 is a central repository of data, such as business objects 162 or 164, extracted from transaction computer system 120 or 130. The data in the data warehouse 165 is used for special analyses, such as data mining analyses used to identify relationships among data. The results of the data mining analysis also are stored in the data warehouse 165.

[0028] The data warehouse computer system 110 includes an automated data mining process 168 having a data warehouse upload process 170 and a data mining analysis process 172. The data warehouse upload process 170 includes executable instructions for automatically extracting, transmitting and loading data from the transaction computer systems 120 and 130 to the data warehouse computer system 110. The data mining analysis process 172 includes executable instructions for triggering a data mining analysis in the data warehouse computer system 110, and enriching the data in the data warehouse 165 with new attributes determined by the data mining analysis, as described more fully below.

[0029] In some implementations, the data warehouse computer system 110 also may include a data mining mart 174 that temporarily stores data from the data warehouse 165 for use in data mining. In such a case, the data mining analysis process 172 also may extract data from the data warehouse 165, store the extracted data to the data mining mart 174, trigger a data mining analysis that operates on the data from the data mining mart 174, and enrich the data in the data warehouse 165 with the new attributes determined by the data mining analysis.

[0030] The data warehouse computer system 110 is capable of delivering and exchanging data with the transaction computer systems 120 and 130 through wired or wireless communication pathways 176 and 178, respectively. The data warehouse computer system 110 also is able to communicate with the on-line client 115 that is connected to the computer system 110 through a communication pathway 182.

[0031] The data warehouse computer system 110, the transaction computer systems 120 and 130, and the on-line client 115 may be arranged to operate within or in concert with one or more other systems, such as, for example, one or more LANs (“Local Area Networks”) and/or one or more WANs (“Wide Area Networks”). The on-line client 115 may be a general-purpose computer that is capable of operating as a client of the application program (e.g., a desktop personal computer, a workstation, or a laptop computer running an application program), or a more special-purpose computer (e.g., a device specifically programmed to operate as a client of a particular application program). The on-line client 115 uses communication pathway 182 to communicate with the data warehouse computer system 110. For brevity, FIG. 1 illustrates only a single on-line client 115 for system 100.

[0032] At predetermined times, the data warehouse computer system 110 initiates an automated data mining process. This may be accomplished, for example, through the use of a task scheduler (not shown) that initiates the automated data mining process at a particular day and time. In general, the automated data mining process 1) uses the data warehouse upload process 170 to initiate the extraction, transformation and loading of data to the data warehouse 165 from the source systems 120 and 130, and 2) uses the data mining analysis process 172 to initiate a data mining run that creates new attributes by performing a special analysis of the data and loads the new attributes to the data warehouse 165 without user manipulation. A particular automated data mining run may be scheduled as a recurring event based on the occurrence of a predetermined time or date (such as the first day of a month, every Saturday at one o'clock a.m., or the first day of a quarter). Examples of automated data mining processes are described more fully in FIGS. 3-5.

[0033] More specifically, the data warehouse computer system 110 uses the automated data mining process 168 to initiate the data warehouse upload process 170. The data warehouse upload process 170 extracts or copies a portion of data, such as all or some of business objects 162, from the data storage 146 of the transaction computer system 120. The extracted data is transmitted over the connection 176 to the data warehouse computer system 110, where the extracted data is stored in data warehouse 165. The data warehouse computer system 110 also may transform the extracted data from a format suitable to computer system 120 into a different format that is suitable for the data warehouse computer system 110. Similarly, the data warehouse computer system 110 may extract a portion of data from data storage 148 of the computer system 130, such as all or some of business objects 164, transmit the extracted data over connection 178, store the extracted data in the data warehouse 165, and optionally transform the extracted data.

[0034] After the data have been extracted from the source computer systems (here, transaction computer systems 120 and 130), the automated data mining process 168 initiates the data mining analysis 172. The data mining analysis 172 performs a particular data mining procedure to analyze data from the data warehouse 165, enrich the data with new attributes, and store the enriched data in the data warehouse 165. A particular data mining procedure also may be referred to as a data mining run. There are different types of data mining runs. A data mining run may be a training run in which data relationships are determined, a prediction run that applies a determined relationship to a collection of data relevant to a future event, such as a customer failing to renew a service contract or make another purchase, or both a training run and a prediction run. The prediction run results in the creation of a new attribute for each business object in the data warehouse 165. The creation of a new attribute may be referred to as data enrichment. For example, when the data mining run predicts the likelihood that each customer will churn, an attribute for the likelihood of churn for each customer is stored in the data warehouse 165. That is, the data warehouse 165 is enriched with the new attribute.

[0035] The combination of the data warehouse upload process 170 and the data mining analysis 172 in the automated data mining process 168 may increase the coupling of the data mining with the upload of new data to the data warehouse, which, in turn, may reduce the time until the results of new data mining analyses are available. The combination of the data warehouse upload process 170 and the data mining analysis 172 in the automated data mining process 168 also enables the use of the same monitoring process to monitor both the data warehouse upload process 170 and the data mining analysis process 172, which, in turn, may help simplify the monitoring of the automated data mining process 168.

[0036] The data warehouse computer system 110 also includes a data warehouse monitor 180 that reports on the administration of the automated data mining process 168. For example, an end user of on-line client 115 is able to view when an automated data mining process is scheduled to next occur, the frequency or other basis on which the automated data mining process is scheduled, and the status of the automated data mining process. For example, the end user may be able to determine that the automated data mining process 168 is executing. When the automated data mining process 168 is executing, the end user may be able to view the progress and status of each of the steps within the data mining method. For example, the end user may be able to view the time that the data warehouse upload process 170 was initiated. The ability to monitor the execution of the automated data mining process may be useful to ensure that the automated data mining process 168 is operating as desired. In some implementations, when a problem is detected in the automation of a data mining process, a notification of the problem may be sent to an administrator for the data warehouse or other type of end user. The use of the data warehouse monitor 180 with both the data upload process 170 and the data mining analysis 172 may be advantageous. For example, a system administrator or another type of user need only access a single monitoring process (here, data warehouse monitor 180) to monitor both sub-processes (here, the data upload process 170 and the data mining analysis 172). The use of the same monitoring process for different sub-processes may result in consistent process behavior across the different sub-processes. The use of the same monitoring process also may reduce the amount of training required for system administrators to be able to use the data warehouse monitor 180.

[0037] The ability to trigger special analyses directly after having loaded new data in a data warehouse environment to enrich the newly loaded data with new attributes may be useful. Without such automation, multiple users, often geographically or organizationally distributed, are typically responsible for performing different aspects of the process, all aspects of which must be completed before the newly loaded data is enriched by the special analyses. This may result in a delay from the time when the transaction data is available for analysis to the time when the results of the analysis are available. The delay may be significant or may negatively impact the business enterprise. For example, a business enterprise may be harmed by sales lost through the delay of product arrangements in a retail store based on a cross-selling analysis or through the delay of a promotional marketing campaign to target at-risk customers.

[0038] FIG. 2 shows the results 200 of enriching the data stored in the data warehouse based on an automated data mining process. The results 200 are stored in a relational database system that logically organizes data into a database table. The database table arranges data associated with an entity (here, a customer) in a series of columns 210-216 and rows 220-223. Each column 210, 211, 212, 213, 214, 215, or 216 describes an attribute of the customer for which data is being stored. Each row 220, 221, 222 or 223 represents a collection of attribute values for a particular customer identified by a customer identifier 210. The attributes 210-215 were extracted from a source system, such as a customer relationship management system, and loaded into the data warehouse. The attribute 216 represents the likelihood of churn for each customer 220, 221, 222 and 223. The likelihood-of-churn attribute 216 was created and loaded into the data warehouse by an automated data mining process, such as the automated data mining process described in FIGS. 1, 3 and 4.

[0039] FIG. 3 illustrates an automated data mining process 300. The automated data mining process 300 may be performed by a processor on a computing system, such as data warehouse computer system 110 of FIG. 1. The processor is directed by a method, script, or other type of computer program that includes executable instructions for performing the automated data mining process 300. An example of such a collection of executable instructions is the automated data mining process 168 of FIG. 1.

[0040] The automated data mining process 300 includes an extract, transform and load (ETL) sub-process 310, a data mining sub-process 320, and a data enrichment sub-process 330. The automated data mining process 300 begins at a predetermined time and date, typically a recurring predetermined time and date. In some implementations, a system administrator or another type of user may manually initiate the automated data mining process 300. In such a case, the automated data mining process 300, once initiated, automatically triggers sub-processes 310, 320 and 330 without requiring further user manipulation.

[0041] For example, a churn management automated data mining process may be associated with a script that includes a remote procedure call to extract data from one or more source systems in step 340, a computer program to transform the extracted data, a database script for loading the data warehouse with the transformed data, and a computer program to perform a churn analysis on the customer data in the data warehouse. Thus, once the script for the churn management automated data mining process has been initiated, by a task scheduler or other type of computer program, each subsequent task is automatically triggered by the completion of the preceding script component.
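The chaining described above, in which each task starts only once its predecessor has completed, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the described implementation: the function `run_automated_mining`, the four step functions, the in-memory `warehouse` dictionary, and the inactivity rule used in place of a churn model are all hypothetical.

```python
def run_automated_mining(steps):
    """Run each step only after the previous one completes without error.

    `steps` is a list of (name, callable) pairs. Each callable receives the
    previous step's return value. A status log is kept for monitoring.
    """
    log = []
    result = None
    for name, step in steps:
        try:
            result = step(result)
        except Exception as exc:
            log.append((name, f"failed: {exc}"))
            break  # later steps are never triggered
        log.append((name, "completed"))
    return result, log


warehouse = {}  # hypothetical stand-in for the data warehouse


def extract(_):
    # Stand-in for a remote procedure call to a source system.
    return [
        {"customer_id": 1, "months_inactive": 7},
        {"customer_id": 2, "months_inactive": 0},
    ]


def transform(rows):
    # Convert the numeric source key to the text key assumed by the warehouse.
    return [{**row, "customer_id": str(row["customer_id"])} for row in rows]


def load(rows):
    warehouse.update({row["customer_id"]: row for row in rows})
    return warehouse


def churn_analysis(table):
    # Placeholder rule, not a real data mining model.
    for row in table.values():
        row["churn_risk"] = "high" if row["months_inactive"] > 6 else "low"
    return table


result, log = run_automated_mining([
    ("extract", extract),
    ("transform", transform),
    ("load", load),
    ("churn_analysis", churn_analysis),
])
```

Because the log records the status of every step in one place, a single monitoring view can cover both the upload and the analysis.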

[0042] The data warehouse processor extracts from a source system appropriate data and transmits the extracted data to the data warehouse (step 340). For example, the data warehouse processor may execute a remote procedure call on the source system to trigger the extraction and transmission of data from the source system to the computer system on which the data warehouse resides. Alternatively, the data warehouse processor may connect to a web service on the source system to request the extraction and transmission of the data. Typically, the data to be extracted is data from a transaction system, such as an OLTP system. The data extracted may be a complete set of the appropriate data (such as all sales orders or all customers) from the source system, or may be only the data that has been changed since the last extraction. The processor may extract and transmit the data from the source system in a series of data groups, such as data blocks. The extraction may be performed either as a background process or an on-line process, as may the transmission. The ability to extract and transmit data in groups, extract and transmit only changed data, and extract and transmit as a background process may collectively or individually be useful, particularly when a large volume of data is to be extracted and transmitted.
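Extracting only changed data and transmitting it in blocks can be sketched as follows. The sketch assumes that each source record carries a change timestamp; the field name `changed_at`, the function `extract_changed`, and the sample records are hypothetical.

```python
from datetime import datetime


def extract_changed(records, last_extraction, block_size):
    """Yield blocks of the records changed after `last_extraction`."""
    changed = [r for r in records if r["changed_at"] > last_extraction]
    for start in range(0, len(changed), block_size):
        yield changed[start:start + block_size]


# Hypothetical sales orders held by a source system.
source_records = [
    {"order_id": 1, "changed_at": datetime(2024, 1, 5)},
    {"order_id": 2, "changed_at": datetime(2024, 2, 10)},
    {"order_id": 3, "changed_at": datetime(2024, 2, 11)},
    {"order_id": 4, "changed_at": datetime(2024, 2, 12)},
]
blocks = list(
    extract_changed(source_records, datetime(2024, 2, 1), block_size=2)
)
```

Passing a `last_extraction` value earlier than every record would yield the complete data set instead of only the changes.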

[0043] In some implementations, the extracted data also may be transformed from the format used by the source system to a different format used by the data warehouse (step 345). The data transformation may include transforming data values representing a particular attribute to a different field type and length that is used by the data warehouse. The data transformation also may include translating a data code used by the source system to a corresponding but different data code used by the data warehouse. For example, the source system may store a country value using a numeric code (for example, a “1” for the United States and a “2” for the United Kingdom) whereas the data warehouse may store a country value as a textual abbreviation (for example, “U.S.” for the United States and “U.K.” for the United Kingdom). The data transformation also may include translating a proprietary key numbering system in which primary keys are created by sequentially allocating numbers within an allocated number range to a corresponding GUID (“globally unique identifier”) key that is produced from a well-known algorithm and is able to be processed by any computer system using the well-known algorithm. The processor may use a translation table or other software engineering or programming techniques to perform the transformations required. For example, the processor may use a translation table that translates the various possible values from one system to another system for a particular data attribute (for example, translating a country code of “1” to “U.S.” and “2” to “U.K.” or translating a particular proprietary key to a corresponding GUID key).
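A translation table of this kind can be sketched as follows. The country mapping comes from the example above. Everything else is an assumption made for illustration: the description does not name the GUID algorithm, so the sketch uses name-based UUIDs (version 5) from Python's standard `uuid` module, and the namespace string, the field names, and the 40-character name length are hypothetical.

```python
import uuid

# Translation table from the source system's numeric country codes.
COUNTRY_CODES = {"1": "U.S.", "2": "U.K."}

# Hypothetical namespace so that the same proprietary key always maps
# to the same GUID on any system that applies the same algorithm.
KEY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "source-system.example")


def transform_record(record):
    """Translate one source record into the assumed warehouse format."""
    return {
        "customer_key": str(
            uuid.uuid5(KEY_NAMESPACE, str(record["customer_no"]))
        ),
        "country": COUNTRY_CODES[record["country"]],
        # Assumed warehouse field type and length: text, 40 characters.
        "name": str(record["name"])[:40],
    }


transformed = transform_record(
    {"customer_no": 1001, "country": "1", "name": "Ada"}
)
```

A name-based identifier is deterministic, so repeated loads of the same customer produce the same key without a lookup table for keys.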

[0044] Other types of data transformation also may be performed by the data warehouse processor. For example, the processor may aggregate data or generate additional data values based on the extracted data. For example, the processor may determine a geographic region for a customer based on the customer's mailing address or may determine the total amount of sales to a particular customer that is associated with multiple sales orders.
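The aggregation of sales across multiple sales orders can be sketched as follows. The function `total_sales_by_customer`, the field names, and the sample orders are hypothetical.

```python
from collections import defaultdict


def total_sales_by_customer(sales_orders):
    """Sum the order amounts for each customer across all sales orders."""
    totals = defaultdict(float)
    for order in sales_orders:
        totals[order["customer_id"]] += order["amount"]
    return dict(totals)


# Hypothetical extracted sales orders.
orders = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C2", "amount": 40.0},
    {"customer_id": "C1", "amount": 30.0},
]
totals = total_sales_by_customer(orders)
```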

[0045] The data warehouse processor loads the extracted data into data storage associated with the data warehouse, such as the data warehouse 165 of FIG. 1 (step 350). The data warehouse processor may execute a computer program having executable instructions for loading the extracted data into the data storage and identified by the automated data mining method directing the process 300. For example, a database script may be executed that includes database commands to load the data to the data warehouse. The use of a separate computer program for loading the data may increase the modularity of the data mining method, which, in turn, may improve the efficiency of modifying the automated data mining process 300. Steps 340-350 may be referred to as the ETL sub-process 310.
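A separate load program of the kind described above can be sketched with an in-memory SQLite database standing in for the data warehouse. The table name `customer`, its columns, and the function `load_customers` are hypothetical, and an actual data warehouse would use its own database system and load commands.

```python
import sqlite3


def load_customers(conn, rows):
    """Load (customer_id, country) rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "customer_id TEXT PRIMARY KEY, country TEXT)"
    )
    # INSERT OR REPLACE lets a repeated load overwrite changed customers.
    conn.executemany(
        "INSERT OR REPLACE INTO customer (customer_id, country) "
        "VALUES (?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_customers(conn, [("C1", "U.S."), ("C2", "U.K.")])
load_customers(conn, [("C2", "U.S.")])  # a later load with changed data
```

Keeping the load in its own function mirrors the modularity noted above: the load step can be modified without touching extraction or analysis.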

[0046] After completing the ETL sub-process 310, the data warehouse processor automatically triggers a data mining process (step 360). This may be accomplished, for example, by using a script or other type of computer program to control the execution of multiple programs.

[0047] The data warehouse processor performs a data mining run (step 365). To do so, the data warehouse processor may apply a data mining model or another type of collection of data mining rules that defines the type of analysis to be performed. The data mining model may be applied to all or a portion of the data in the data warehouse. In some implementations, the data warehouse processor may store the data to be used in the data mining run in transient or persistent storage peripheral to the data warehouse processor where the data is accessed during the data mining run. This may be particularly advantageous when the data warehouse includes a very large volume of data and/or the data warehouse also is used for OLAP processing. In some cases, the storage of the data to transient or persistent storage may be referred to as extracting or staging the data to a data mart for data mining purposes.

[0048] The data mining run may be a training run or a prediction run. In some implementations, both a training run and a prediction run may be performed during process 300. The results of the data mining run are stored in temporary storage. For example, in a customer churn analysis data mining process, the likelihood of churn for each customer may be assessed and stored in a temporary results data structure. The data warehouse processor later may copy the results stored in the temporary data structure to the data warehouse. Steps 360-365 may be referred to as a data mining sub-process 320.

[0049] When the data mining sub-process 320 is completed, the data warehouse processor stores the data mining results in the data warehouse (step 370). For example, a new column for the data mining results may be added to a table in a relational data management system being used for the data warehouse. In a customer churn analysis data mining process, the likelihood of churn for each customer may be added as a new attribute in the data warehouse and appropriately populated with the likelihood data generated when the data mining run was performed in step 365. The process of storing the data created by the data mining run in the data warehouse may be referred to as a data enrichment sub-process 330.
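
One possible form of this enrichment step is sketched below using an in-memory SQLite database as a stand-in for the data warehouse. The table name `customer`, the column name `churn_likelihood`, and the sample values are assumptions made for illustration.

```python
import sqlite3

# An in-memory database stands in for the data warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Ben")])

# Hypothetical results of a prediction run, held in a temporary structure.
churn_results = {1: 0.15, 2: 0.80}

# Enrichment: add the new attribute, then populate it from the results.
conn.execute("ALTER TABLE customer ADD COLUMN churn_likelihood REAL")
conn.executemany(
    "UPDATE customer SET churn_likelihood = ? WHERE customer_id = ?",
    [(score, customer_id) for customer_id, score in churn_results.items()],
)
conn.commit()

enriched = conn.execute(
    "SELECT customer_id, churn_likelihood FROM customer ORDER BY customer_id"
).fetchall()
```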

[0050] In one example, the process 300 may be used for an automated customer-churn data mining process. A system administrator develops computer programs, each of which is executed to accomplish a portion of the automated customer-churn data mining process. The system administrator also develops a script that identifies each of the computer programs to be executed and the order in which the computer programs are to be executed to accomplish the automated customer-churn data mining process. The system administrator, using a task scheduling program, schedules the automated customer-churn data mining script to be triggered on a monthly basis, such as on the first Saturday of each month and beginning at one o'clock a.m.

[0051] At the scheduled time, the task scheduling program triggers the data warehouse processor to execute the automated customer-churn data mining script. The data warehouse processor executes a remote procedure call in a customer relationship management system to extract customer data and transmit the data to the data warehouse computer system. The data warehouse computer system receives and stores the extracted customer data. The data warehouse processor executes a computer program, as directed by the executing automated customer-churn data mining process script, to transform the customer data to a format usable by the data warehouse.

[0052] The data warehouse processor continues to execute the automated customer-churn data mining process script, which then triggers a data mining training run to identify hidden relationships within the customer data. Specifically, the characteristics of customers who have not renewed a service contract in the last eighteen months are identified. The characteristics identified may include, for example, an income above or below a particular level, a geographic region in which the non-returning customer resides, the types of service contract that were not renewed, and the median age of a non-renewing customer.

[0053] The data warehouse processor then, under the continued direction of the automated customer-churn data mining process script, triggers a data mining prediction run to identify particular customers who are at risk of not renewing a service contract. The prediction is made based on the customer characteristics identified in the data mining training run. The data warehouse processor determines a likelihood-of-churn for each customer. The data warehouse is enriched with the likelihood-of-churn for each customer such that a likelihood-of-churn attribute is added to the customer data in the data warehouse and the likelihood-of-churn value for each customer is stored in the new attribute.
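
The relationship between the training run and the prediction run may be illustrated with a deliberately simple stand-in. The sketch below is not the data mining method of this description: it assumes a single characteristic (region) and uses the observed non-renewal rate per region as the likelihood-of-churn. The record layout and the default value for unseen regions are hypothetical.

```python
from collections import defaultdict


def train(history):
    """Training run: return the observed non-renewal rate per region."""
    counts = defaultdict(lambda: [0, 0])  # region -> [not renewed, total]
    for row in history:
        counts[row["region"]][0] += 0 if row["renewed"] else 1
        counts[row["region"]][1] += 1
    return {region: lost / total for region, (lost, total) in counts.items()}


def predict(model, customers, default=0.5):
    """Prediction run: assign each customer the rate learned for the customer's region."""
    return {c["id"]: model.get(c["region"], default) for c in customers}


# Hypothetical historical contract data.
history = [
    {"region": "north", "renewed": False},
    {"region": "north", "renewed": False},
    {"region": "north", "renewed": True},
    {"region": "north", "renewed": True},
    {"region": "south", "renewed": True},
    {"region": "south", "renewed": True},
    {"region": "south", "renewed": True},
    {"region": "south", "renewed": False},
]

model = train(history)
likelihood_of_churn = predict(
    model, [{"id": 1, "region": "north"}, {"id": 2, "region": "south"}]
)
```

The prediction run consumes only the output of the training run, which mirrors the ordering of the two runs in the script described above.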

[0054] In some implementations, when a subsequent likelihood-of-churn value for a customer is determined, such as a likelihood-of-churn value for a customer that is determined in the following month, the likelihood-of-churn value from the previous data mining prediction run may be replaced so that a customer has only one likelihood-of-churn value at any time. In contrast, some implementations may store the new likelihood-of-churn value each month, in addition to a previous value for the likelihood-of-churn, to develop a time-dependent prediction; that is, a new prediction for the same type of prediction is stored each time a prediction run is performed for a customer. The time-dependent prediction may help improve the accuracy of the data mining training runs because the predicted values may be monitored over time and compared with actual customer behavior.
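
The two storage strategies may be contrasted with a short sketch. The names `current`, `history`, and `store_prediction`, and the month labels, are hypothetical; a real warehouse would hold these values in tables rather than in-memory structures.

```python
# Strategy 1: one value per customer, overwritten on each prediction run.
current = {}

# Strategy 2: one value per customer per run, kept as a time series.
history = []


def store_prediction(customer_id, run_month, likelihood):
    """Record a prediction under both strategies for comparison."""
    current[customer_id] = likelihood
    history.append((customer_id, run_month, likelihood))


store_prediction(42, "2024-01", 0.30)
store_prediction(42, "2024-02", 0.45)

# The time-dependent series for one customer, in run order.
series_for_42 = [(month, value) for cid, month, value in history if cid == 42]
```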

[0055] FIG. 4 illustrates another example of an automated data mining process. In contrast to the automated data mining process 300 of FIG. 3, automated data mining process 400 replicates data from a data warehouse, such as data warehouse 165 in FIG. 1, to a data mining mart, such as data mining mart 174 of FIG. 1. The data mining process 400 then performs the data mining analysis on data in the data mart, and stores the data mining results as enriched data in the data warehouse.

[0056] The automated data mining process 400 may be performed by a processor on a computing system, such as data warehouse computer system 110 of FIG. 1. The processor is directed by a method, script, or other type of computer program that includes executable instructions for performing the automated data mining process 400. An example of such a collection of executable instructions is the automated data mining process 168 of FIG. 1.

[0057] The automated data mining process 400 includes an extract, transform and load (ETL) sub-process 410, a data mining sub-process 420 that uses a data mart, and a data enrichment sub-process 430. The automated data mining process 400 begins at a predetermined time and date, typically a recurring predetermined time and date. The ETL sub-process 410 extracts data from a transactional processing or other type of source system and loads the data to a data warehouse, as described previously with respect to ETL sub-process 310 of FIG. 3.

[0058] After completing the ETL sub-process 410, the data warehouse processor automatically triggers a data mining run, as described with respect to step 360 in FIG. 3 (step 440). The data warehouse processor copies data from the data warehouse to the data mining mart for use in a data mining run (step 450). For example, when the data warehouse and the data mining mart are located on the same computer system, the data warehouse processor may insert into database tables of a data mining mart a copy of some of the data rows stored in the data warehouse. Alternatively, when the data warehouse is located on a different computer system than the computer system on which the data mart is located, the data warehouse processor may extract data from the data warehouse on a computer system and transmit the data to the data mart located on a different computer system. The data warehouse processor then may execute a remote procedure call or other collection of executable instructions to load data into the data mart. In some implementations, the data warehouse processor may replicate data from the data warehouse to the data mining mart. That is, the data warehouse processor copies the data to the data mining mart and synchronizes the data mining mart with the data warehouse such that changes made to one of the data warehouse or the data mining mart are reflected in the other. In some implementations, the data warehouse processor may transform the data from the data warehouse before storing the data in the data mining mart.
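
For the case in which the data warehouse and the data mining mart reside on the same computer system, the copy of step 450 may be sketched as follows. SQLite is used purely as a stand-in; the attached schema name `mart`, the table layout, and the row filter are assumptions made for the example.

```python
import sqlite3

# The main in-memory database stands in for the data warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER, region TEXT, last_purchase REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 40.0), (3, "north", 75.0)],
)
conn.commit()  # SQLite does not allow ATTACH inside an open transaction

# A second, attached database stands in for the data mining mart.
conn.execute("ATTACH DATABASE ':memory:' AS mart")

# Copy only some rows and attributes from the warehouse into the mart.
conn.execute(
    "CREATE TABLE mart.customer AS "
    "SELECT customer_id, last_purchase FROM main.customer WHERE region = 'north'"
)
conn.commit()

mart_rows = conn.execute(
    "SELECT customer_id, last_purchase FROM mart.customer ORDER BY customer_id"
).fetchall()
```

This sketch performs a one-time copy only; it does not implement the synchronization that replication would require.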

[0059] The data warehouse processor then performs a data mining run, as described in step 365 in FIG. 3, using data in the data mining mart (step 460). The steps 440-460 may be referred to as a data mining sub-process 420. When the data mining sub-process 420 is completed, the data warehouse processor stores the data mining results in the data warehouse (step 470), as described in step 370 and sub-process 330 in FIG. 3.

[0060] FIG. 5 depicts the components of a software architecture 500 for an automated data mining process. The software architecture 500 may be used to implement the automated data mining process 300 described in FIG. 3 or the automated data mining process 400 described in FIG. 4. The software architecture 500 may be implemented, for example, on computer system 110 of FIG. 1. FIG. 5 also illustrates a data flow and a process flow using the components of the software architecture to implement the automated data mining process 400 in FIG. 4.

[0061] The software architecture 500 includes an automated data mining task scheduler 510, a transaction data extractor 515, and a data mining extractor 520. The software architecture also includes a transaction processing data management system 525 for a transaction processing system, such as transaction computer system 120 or transaction computer system 130 in FIG. 1. The software architecture also includes a data warehouse 530, such as the data warehouse 165 in FIG. 1, and a data mart 535, such as the optional data mart 174 in FIG. 1.

[0062] One example of the automated data mining task scheduler 510 is a process chain for triggering the transaction data extractor 515 and the data mining extractor 520 at a predetermined date and time. In general, a process chain is a computer program that defines particular tasks that are to occur in a particular order at a predetermined date and time. For example, a system administrator or another type of user may schedule the process chain to occur at regular intervals, such as at one o'clock a.m. the first Saturday of a month, every Sunday at eight o'clock a.m., or at two o'clock a.m. on the first day and the fifteenth day of each month. A process chain may include dependencies between the defined tasks in the process chain such that a subsequent task is not triggered until a previous task has been successfully completed. In this example, the automated data mining task scheduler 510 is a process chain that calls two extractor processes: the transaction data extractor 515 and the data mining extractor 520. The data mining extractor 520 is only initiated after the successful completion of the transaction data extractor 515.
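
The dependency behavior of such a process chain may be sketched as below. The function names and the convention that a return code of zero signals success are assumptions for illustration; scheduling at a date and time is left to whatever task scheduler is available and is not shown.

```python
def run_chain(tasks):
    """Run tasks in order; stop at the first task that does not report success.

    Each task is a (name, callable) pair, and each callable returns a return
    code in which 0 is assumed to mean success.
    """
    completed = []
    for name, task in tasks:
        if task() != 0:
            break  # a later task is never triggered after a failure
        completed.append(name)
    return completed


# Hypothetical stand-ins for the two extractor processes.
def transaction_data_extractor():
    return 0  # report successful completion


def data_mining_extractor():
    return 0  # report successful completion


completed_tasks = run_chain(
    [
        ("transaction_data_extractor", transaction_data_extractor),
        ("data_mining_extractor", data_mining_extractor),
    ]
)
```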

[0063] The automated data mining task scheduler 510 starts the transaction data extractor 515 at a predetermined date and time, as illustrated by process flow 542. In general, an extractor is a computer program that performs the extraction of data from a data source using a set of predefined settings. Typical settings for an extractor include data selection settings that identify the particular data attributes and data filter settings that identify the criteria for selecting the particular records to be extracted. For example, an extractor may identify three attributes (customer number, last purchase date, and amount of last purchase) that are to be extracted for all customers that are located in a particular geographic region. The extractor then reads the attribute values for the records that meet the filter condition from the data source, maps the data to the attributes included in the data warehouse, and loads the data to the data warehouse. An extractor also may be referred to as an upload process.
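
An extractor driven by selection, filter, and mapping settings may be sketched as follows. The source rows, the attribute names, and the mapping to warehouse attribute names are hypothetical, and a plain list stands in for the data warehouse.

```python
def extract(source_rows, attributes, row_filter, mapping):
    """Select filtered rows, keep the chosen attributes, and rename them."""
    loaded = []
    for row in source_rows:
        if row_filter(row):
            loaded.append({mapping[name]: row[name] for name in attributes})
    return loaded


# Hypothetical source data.
source = [
    {"cust_no": 1, "last_date": "2024-01-05", "last_amt": 20.0, "region": "EU"},
    {"cust_no": 2, "last_date": "2024-02-11", "last_amt": 35.0, "region": "US"},
]

# Predefined settings: data selection, data filter, and attribute mapping.
settings = {
    "attributes": ["cust_no", "last_date", "last_amt"],
    "row_filter": lambda row: row["region"] == "EU",
    "mapping": {
        "cust_no": "customer_number",
        "last_date": "last_purchase_date",
        "last_amt": "last_purchase_amount",
    },
}

warehouse_rows = extract(source, **settings)
```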

[0064] The transaction data extractor 515 extracts, using predefined settings, data from the transaction processing data management system, as indicated by data flow line 544, and transforms the data as necessary to prepare the data to be loaded to the data warehouse 530. The transaction data extractor 515 then loads the extracted data to the data warehouse 530, as indicated by data flow 546. After the extracted data has been loaded, the transaction data extractor 515 returns processing control to the automated data mining task scheduler 510, as indicated by process flow 548. When returning processing control, the transaction data extractor 515 also reports the successful completion of the extraction.

[0065] Based on the successful completion of the transaction data extractor 515, the automated data mining task scheduler 510 starts the data mining extractor 520, as illustrated by process flow 552. In general, the data mining extractor initiates a data mining process using the newly loaded transaction data in the data warehouse 530. The data mining process analyzes the data and writes the results back to the data warehouse.

[0066] First, the data mining extractor 520 extracts data from the data warehouse 530 (function 555), as illustrated by data flow 556, and loads the extracted data to the data mart 535, as illustrated by data flow 558, for use by the data mining analysis. The data mining extractor 520 then performs a data mining training analysis (function 560) using the data from the data mart 535, as illustrated by data flow 562. The data mining extractor 520 updates the appropriate data mining model in data mining model 565 with the results of the data mining training analysis, as illustrated by data flow 564.

[0067] The data mining extractor 520 uses the results of the data mining training analysis from the data mining model 565, as illustrated by data flow 566, to perform a data mining prediction analysis (function 568). The data mining extractor 520 stores the results of the data mining prediction analysis in the data mart 535, as illustrated by data flow 569.

[0068] The data mining extractor 520 then performs a data enrichment function (function 570) using the results from the data mart 535, as illustrated by data flow 572, to load the data mining results into the data warehouse 530, as illustrated by data flow 574. After enriching the data warehouse 530 with the data mining analysis results, the data mining extractor 520 returns processing control to the automated data mining task scheduler 510, as depicted by process flow 576. When returning processing control, the data mining extractor 520 also reports to the automated data mining task scheduler 510 the successful completion of the data mining analyses and enrichment of the data warehouse. To do so, the data mining extractor 520 may report a return code that is consistent with a successful process.

[0069] The use of a task scheduler, here in the form of a process chain, to link the task of extracting the transaction data from a source system with the task of performing the data mining process may be useful. For example, the process for loading transaction data to the data warehouse is combined with an immediate data mining analysis and enrichment of the data warehouse data with the results of the analysis. The linkage of the transactional data availability with the automatic performance of the data mining analysis may reduce, perhaps even substantially reduce, the lag between the time at which the transaction data first becomes available in the data warehouse and the time at which the data enriched with data mining analysis results becomes available in the data warehouse.

[0070] There also may be advantages in using one type of data loading computer program (here, an extractor) for both (1) the load of the transaction data to the data warehouse and (2) the performance of the data mining analysis and the enrichment of the data warehouse data with the data mining analysis results. This may be particularly true when a data mart is used for temporary storage of data from the data warehouse, such that an extraction from the data warehouse is to be performed. For example, in some data warehousing systems, a task scheduler may be available only for use with a data loading process and may not be available for general use with a data mining process. In such a case, wrapping the data mining process within a data loading process allows a data mining process to be automatically triggered at a predetermined time on a scheduled basis (such as daily, weekly or monthly at a particular time).

[0071] More generally, the use of the same types of techniques, procedures and processes for both a data extraction process and an analytical process of a data mining run may be useful. For example, it may enable the use of a common software tool for administering a data warehouse and a data mining run, particularly when data is extracted from a data warehouse for use by a data mining run. The use of the same techniques, procedures and processes for both a data extraction process and an analytical process also may make a function available to both processes when the function was previously available only to one of the analytical process or the data extraction process. It also may encourage consistent behavior from a data warehouse process and a data mining analysis, which may, in turn, reduce the amount of training required by a system administrator.

[0072] FIG. 6 depicts a process 600 supported by a data mining workbench for defining an automated data mining process. The data mining workbench presents a user interface to guide a user to define a particular type of automated data mining analysis. In general, the data mining workbench uses a generic template for a particular type of data mining analysis, receives user-entered information applicable to the generic template, receives scheduling information from the user, and generates a particular automated data mining process.

[0073] The process 600 to define an automated data mining process begins when the data mining workbench presents a user interface for the user to enter identifying information for the data mining analysis process being defined (step 610). For example, the user may enter a name or another type of identifier and a description of the data mining analysis.

[0074] The data mining workbench then presents an interface that allows a user to identify the data mining analysis template to be used (step 620). For example, the data mining workbench may present a list of data mining analysis templates, such as a template for a particular type of customer loyalty analysis or a template for a particular type of cross-selling analysis, from which the user selects.

[0075] Based on the data mining analysis template selected, the data mining workbench presents an appropriate interface to guide the user through the process of entering the user-configurable data mining information to configure the template for the particular analysis being defined (step 630). In one example of defining an automated data mining analysis for determining the effect of a particular marketing campaign, the user enters an identifier for the particular marketing campaign to be analyzed, the particular customer attributes to be analyzed, the attributes to be measured to determine the effect of the marketing campaign (such as a sales attribute), and the filter criteria for selecting the records to be analyzed. The data mining analysis template includes a portion for the transaction data extraction, such as transaction data extractor 515 in FIG. 5, and a portion for data mining extraction, such as data mining extractor 520 in FIG. 5.

[0076] The user then schedules when the data mining analysis process should be automatically triggered (step 640). For example, the user may identify a recurring pattern of dates and times for triggering the data mining analysis. This may be accomplished through the presentation of a calendar or the presentation of a set of schedule options from which the user selects.

[0077] The data mining workbench then stores a version of the generic data mining analysis template with the user-entered information (step 650). To do so, for example, the data mining workbench may use the name or identifier entered by the user as the name of the stored automated data mining process. The automated data mining process may be added to a task scheduler and scheduled based on the information the user entered.
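
Steps 610 through 650 may be sketched as the combination of a generic template with user-entered information. The class names, the required setting names, the schedule string, and the in-memory `stored_processes` registry are all hypothetical and serve only to show how a stored process definition might be assembled and keyed by the user-entered name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisTemplate:
    """Generic template for one type of data mining analysis."""
    analysis_type: str
    required_settings: tuple


@dataclass
class AutomatedProcess:
    """A template configured with user-entered information and a schedule."""
    name: str
    description: str
    template: AnalysisTemplate
    settings: dict
    schedule: str


def define_process(name, description, template, settings, schedule):
    """Check the user-entered settings against the template and build the process."""
    missing = [key for key in template.required_settings if key not in settings]
    if missing:
        raise ValueError(f"missing settings: {missing}")
    return AutomatedProcess(name, description, template, dict(settings), schedule)


campaign_template = AnalysisTemplate(
    "campaign_effect",
    ("campaign_id", "customer_attributes", "measured_attributes", "filter"),
)

stored_processes = {}  # keyed by the user-entered name
process = define_process(
    "spring_campaign",
    "Effect of the spring marketing campaign",
    campaign_template,
    {
        "campaign_id": "CMP-1",
        "customer_attributes": ["age", "region"],
        "measured_attributes": ["sales"],
        "filter": "region = 'EU'",
    },
    "first Saturday of each month, 01:00",
)
stored_processes[process.name] = process
```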

[0078] Although the techniques and concepts described above refer to a single data mining process, the applicability of the techniques and concepts is not limited to a single data mining process. For example, a particular data warehouse may be used for, and typically is used for, many different data mining processes, many of which may benefit from being automated as described herein.

[0079] A number of implementations of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other implementations are within the scope of the following claims.

What is claimed is:
 1. A method for conducting a data mining process, the method comprising: triggering the data mining process wherein: the data mining process includes an analytical process, the triggering is based on the presence of data in a data source that is used for analytical processing, and the analytical process uses a procedure that also is usable in a data extraction process; creating a data attribute by performing the analytical process on data from the data source after the analytical process has been triggered; and storing the created data attribute in the data source.
 2. The method of claim 1 further comprising extracting data from a data source used for transaction processing.
 3. The method of claim 2 wherein a person initiates at most the step of extracting data from the data source used for transaction processing.
 4. The method of claim 2 further comprising loading the extracted data to the data source that is used for analytical processing.
 5. The method of claim 4 wherein a person initiates at most the step of loading the extracted data.
 6. The method of claim 1 further comprising: extracting data from the data source that is used for analytical processing; and loading the data extracted from the data source that is used for analytical processing to temporary data storage, wherein performing the analytical process comprises performing the analytical process using data stored in the temporary data storage.
 7. The method of claim 1 wherein triggering the analytical process based on the presence of data in the data source that is used for analytical processing comprises triggering an analytical process based on the completion of a computer program for loading data to the data source that is used for analytical processing.
 8. The method of claim 1 further comprising triggering, based on an occurrence of a predetermined date and time, loading the data extracted from a data source used for transaction processing to the data source that is used for analytical processing.
 9. The method of claim 1 further comprising triggering, based on an occurrence of a predetermined date and time, extracting data from a data source used for transaction processing and loading the extracted data to the data source that is used for analytical processing.
 10. The method of claim 1 wherein performing the analytical process comprises determining a relationship between two data values in the data source.
 11. The method of claim 10 wherein performing the analytical process comprises determining a relationship between two data values that predict a likelihood of whether a particular customer will fail to purchase a service or product in the future.
 12. The method of claim 10 wherein performing the analytical process comprises identifying products that are purchased in the same transaction.
 13. The method of claim 10 wherein performing the analytical process comprises identifying services that are purchased in the same transaction.
 14. The method of claim 10 wherein performing the analytical process comprises applying a previously-determined relationship between two data values to data in the data source.
 15. The method of claim 14 wherein applying a previously-determined relationship comprises determining a likelihood of whether a particular customer will fail to purchase a service or a product in the future based on characteristics associated with customers who have been identified as failing to purchase a service or a product.
 16. The method of claim 1 wherein performing the analytical process comprises determining a relationship between two data values in the data source and applying the determined relationship to data in the data source.
 17. The method of claim 16 wherein: determining a relationship between two data values in the data source comprises determining a relationship between two data values that predict a likelihood of whether a particular customer will fail to purchase a service or product in the future, and applying the determined relationship to data in the data source comprises determining a likelihood of whether a particular customer will fail to purchase a service or a product in the future based on the relationship determined between two data values that predict a likelihood of whether a particular customer will fail to purchase a service or product in the future.
 18. A method for conducting a data mining process, the method comprising: triggering the data mining process wherein: the data mining process includes an analytical process, the triggering is based on the presence of data in a data source that is used for analytical processing, and the analytical process uses a procedure that also is usable in a data extraction process; extracting data from the data source that is used for analytical processing; loading the data extracted from the data source that is used for analytical processing to temporary data storage; creating a data attribute by performing the analytical process on data in the temporary data storage; and storing the created data attribute in the data source that is used for analytical processing.
 19. The method of claim 18 further comprising: extracting data from a data source used for transaction processing, and loading the extracted data to the data source that is used for analytical processing.
 20. The method of claim 19 wherein a person initiates at most the step of extracting data from the data source used for transaction processing.
 21. The method of claim 18 wherein performing the analytical process comprises determining a relationship between two data values in the data source.
 22. A computer-readable medium or propagated signal having embodied thereon a computer program configured to conduct a data mining process, the medium or signal comprising one or more code segments configured to: trigger the data mining process wherein: the data mining process includes an analytical process, the triggering is based on the presence of data in a data source that is used for analytical processing, and the analytical process uses a procedure that also is usable in a data extraction process; create a data attribute by performing the analytical process on data from the data source after the analytical process has been triggered; and store the created data attribute in the data source.
 23. The medium or signal of claim 22 wherein the one or more code segments are further configured to: extract data from the data source that is used for analytical processing; and load the data extracted from the data source that is used for analytical processing to temporary data storage, wherein the one or more code segments configured to perform the analytical process comprise one or more code segments configured to perform the analytical process using data stored in the temporary data storage.
 24. The medium or signal of claim 22 wherein the one or more code segments configured to trigger the analytical process comprise one or more code segments configured to trigger an analytical process based on the completion of a computer program for loading data to the data source that is used for analytical processing.
 25. The medium or signal of claim 22 wherein the one or more code segments are further configured to trigger, based on an occurrence of a predetermined date and time, loading the data extracted from a data source used for transaction processing to the data source that is used for analytical processing.
 26. The medium or signal of claim 22 wherein the one or more code segments are further configured to trigger, based on an occurrence of a predetermined date and time, extracting data from a data source used for transaction processing and loading the extracted data to the data source that is used for analytical processing.
 27. The medium or signal of claim 22 wherein the one or more code segments configured to perform the analytical process comprise one or more code segments configured to determine a relationship between two data values in the data source.
 28. A system for conducting a data mining process, the system comprising a processor connected to a storage device and one or more input/output devices, wherein the processor is configured to: trigger the data mining process wherein: the data mining process includes an analytical process, the triggering is based on the presence of data in a data source that is used for analytical processing, and the analytical process uses a procedure that also is usable in a data extraction process; create a data attribute by performing the analytical process on data from the data source after the analytical process has been triggered; and store the created data attribute in the data source.
 29. The system of claim 28 wherein the processor is further configured to: extract data from the data source that is used for analytical processing; load the data extracted from the data source that is used for analytical processing to temporary data storage; and perform the analytical process using data stored in the temporary data storage.
 30. The system of claim 28 wherein the processor is configured to trigger an analytical process based on the completion of a computer program for loading data to the data source that is used for analytical processing.
 31. The system of claim 28 wherein the processor is further configured to trigger, based on an occurrence of a predetermined date and time, loading the data extracted from a data source used for transaction processing to the data source that is used for analytical processing.
 32. The system of claim 28 wherein the processor is further configured to trigger, based on an occurrence of a predetermined date and time, extracting data from a data source used for transaction processing and loading the extracted data to the data source that is used for analytical processing.
 33. The system of claim 28 wherein the processor is configured to determine a relationship between two data values in the data source.
 34. A method for defining an automated data mining process, the method comprising: presenting a user interface for: identifying a template for a type of automated data mining process for triggering an analytical process, the analytical process using a procedure that also is usable in a data extraction process, based on the presence of data in a data source that is used for analytical processing, creating a data attribute by performing the analytical process on data from the data source after the analytical process has been triggered, and storing the created data attribute in the data source; and entering information for defining the automated data mining process; associating the entered information with the identified template; and storing the associated entered information with the identified template as a computer program configured to perform the automated data mining process.